Scalable Processors in the Billion-Transistor Era: IRAM

Authors

Christoforos E. Kozyrakis

Stylianos Perissakis

David A. Patterson

Thomas E. Anderson

Krste Asanovic

Neal Cardwell

Richard Fromm

Jason Golbus

Benjamin Gribstad

Kimberly Keeton

Randi Thomas

Noah Treuhaft

Katherine A. Yelick

Abstract

The importance of an efficient memory system is increasing as fabrication processes scale down, yielding faster processors and larger memories. This trend widens the processor-memory gap. Not long ago, off-chip main memory was able to supply the CPU with data at an adequate rate. Today, with processor performance increasing at a rate of about 60 percent per year and memory latency improving by just 7 percent per year [1], it takes dozens of cycles for data to travel between the CPU and main memory.

Designers are investing vast amounts of chip resources to bridge this gap. An increasing fraction of the area budget within microprocessor chips is devoted to static RAM (SRAM) caches. For instance, almost half of the die area in the Digital Alpha 21164 is occupied by caches, used solely for hiding memory latency. This cache memory is just a redundant copy of information that would not be necessary if main memory had kept up with processor speed. Still, some applications show poor locality, resulting in low performance even with large caches [2].

Other latency tolerance techniques include combining large caches with some form of out-of-order execution and speculation. Yet this is not an efficient solution for the future, as it requires a disproportionate increase in chip area and complexity. Consider, for example, the MIPS R5000 and R10000 processors. The first is a simple RISC processor, while the second is a complex, out-of-order, speculative one. The R10000 takes 3.43 times more area than the R5000, but its performance, as measured by its SPECint95 peak rating, is only 1.64 times higher. Other architecture alternatives, like wide superscalar and VLIW (very long instruction word), suffer from drawbacks (implementation complexity, low utilization of resources, and immature compiler technology) or deliver only modest performance improvements. Moreover, they usually exacerbate the main memory bandwidth bottleneck [3].

Beyond the uniprocessor, the possibility exists for the integration of multiple processors on a single die, but this integration would place even greater demands on the memory system. For any given die size, putting more processors on a die will result in less on-chip memory for each, thus increasing the number of slow, off-chip memory accesses. In general, increasing the computing resources without a corresponding increase in the on-chip memory will lead to an unbalanced system. Functional units will often be starved for data because of the high latency and limited bandwidth to and from off-chip memory. …
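The figures quoted in the abstract compound quickly. As a rough illustration that is not part of the paper, the sketch below works through the arithmetic: how fast the processor-memory gap widens and how the R10000 compares with the R5000 in performance per unit area. The function names, the constant-rate assumption, and the chosen time horizons are hypothetical and serve only this example.

```python
# Illustrative arithmetic only. The rates and ratios are the ones quoted in
# the abstract; the function names and the annual-compounding model are
# assumptions made for this sketch.
CPU_GROWTH = 0.60  # processor performance improvement per year
MEM_GROWTH = 0.07  # memory latency improvement per year


def gap_growth(years: int) -> float:
    """Factor by which the processor-memory gap widens after `years` years,
    assuming both rates stay constant and compound annually."""
    return ((1 + CPU_GROWTH) / (1 + MEM_GROWTH)) ** years


def perf_per_area(perf_ratio: float, area_ratio: float) -> float:
    """Relative performance per unit of die area."""
    return perf_ratio / area_ratio


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years the gap is {gap_growth(years):.1f}x wider")
    # R10000 vs. R5000: 1.64x the SPECint95 peak rating on 3.43x the area.
    print(f"R10000 perf per area vs. R5000: {perf_per_area(1.64, 3.43):.2f}")
```

Under these assumptions the gap widens by roughly 1.5x per year, and the R10000 delivers a little under half the R5000's performance per unit area.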


Related resources

High-Throughput Switch-Based Interconnect for Future SoCs

System on Chip (SoC) design in the forthcoming billion-transistor era will involve the integration of numerous heterogeneous semiconductor intellectual property (IP) blocks. The success of this approach depends on the seamless integration of cores like processors, memories, UARTs, etc. Some of the main problems in future SoC designs arise from non-scalable global wire delays, failure to achieve...


Parallel Media Processors for the Billion-Transistor Era

This paper describes the challenges presented by single-chip parallel media processors (PMPs). These machines integrate multiple parallel function units, instruction execution, and memory hierarchies on a single chip. The combination of programmability and high performance on data parallelism is necessary to meet the demands of next-generation multimedia applications. Many research issues must be...


Exploring Core Designs for Chip Multiprocessors

The era of billion transistor chips is fast approaching, and the emerging trend is to use these transistors to integrate multiple processors onto a single chip. In this paper, we explore the core design for a chip multiprocessor (CMP). We have found that out-of-order cores provide better absolute performance than in-order cores, both for commercial and scientific workloads. In general, it takes...


A New Direction for Computer Architecture Research

In this paper we suggest a different computing environment as a worthy new direction for computer architecture research: personal mobile computing, where portable devices are used for visual computing and personal communications tasks. Such a device supports in an integrated fashion all the functions provided today by a portable computer, a cellular phone, a digital camera and a video game. The...


How to build scalable on-chip ILP networks for a decentralized architecture

The era of billion transistors-on-a-chip is creating a completely different set of design constraints, forcing radically new microprocessor architecture designs. This paper examines a few of the possible microarchitectures that are capable of obtaining scalable ILP performance. First, we observe that the network that interconnects the processing elements is the critical design point in the micr...



Journal:

IEEE Computer

Volume 30


Publication date: 1997
